Information theoretic limits of learning a sparse rule
We consider generalized linear models in regimes where the number of nonzero components of the signal and the number of accessible data points are both sublinear in the size of the signal. We prove a variational formula for the asymptotic mutual information per sample when the system size grows to infinity. This result allows us to derive an expression for the minimum mean-square error (MMSE) of the Bayesian estimator when the signal entries have a discrete distribution with finite support. We find that, for such signals and suitable vanishing scalings of the sparsity and sampling rate, the MMSE is a nonincreasing, piecewise-constant function of the sampling rate. In specific instances the MMSE even displays an all-or-nothing phase transition, that is, the MMSE sharply jumps from its maximum value to zero at a critical sampling rate. The all-or-nothing phenomenon has previously been shown to occur in high-dimensional linear regression. Our analysis goes beyond the linear case and applies to learning the weights of a perceptron with general activation function in a teacher-student scenario. In particular, we discuss an all-or-nothing phenomenon for the generalization error with a sublinear number of training examples.
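For concreteness, the observation model and the quantity governed by the phase transition can be sketched as follows; the notation (n, k_n, m_n, the activation \varphi, and the normalizations) is illustrative and is not taken verbatim from the paper.

% Generalized linear model with sublinear sparsity and sample size (illustrative notation).
% The signal X* in R^n has k_n = o(n) nonzero entries drawn i.i.d. from a discrete prior
% with finite support; only m_n = o(n) observations are available.
\[
  Y_\mu \;=\; \varphi\!\big( (A X^*)_\mu ,\, Z_\mu \big), \qquad \mu = 1, \dots, m_n,
\]
% Here A is a feature/sensing matrix with i.i.d. entries, Z_mu is noise, and varphi is a
% known activation: varphi(x, z) = x + sigma*z gives sparse linear regression, while a
% sign/threshold activation corresponds to the perceptron teacher-student setting.
\[
  \mathrm{MMSE}_n \;=\; \frac{1}{k_n}\, \mathbb{E}\,\big\| X^* - \mathbb{E}[\,X^* \mid Y, A\,] \big\|^2 .
\]
% The all-or-nothing transition says that, as the sampling rate crosses a critical value,
% MMSE_n jumps from its maximal (prior) value to zero.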
Review for NeurIPS paper: Information theoretic limits of learning a sparse rule
Summary and Contributions: The paper considers the generalized linear model in the high-dimensional sparse setting. In this setting the authors prove the existence of an 'all-or-nothing phenomenon' (see GZ, RXZ): there exists a function m_*(n) (sublinear in the ambient dimension n) such that
- if the number of samples m > (1 + eps) m_*, the error in estimating the signal vanishes; and
- if m < (1 - eps) m_*, the error converges to the 'trivial' error of estimating from the prior alone (see the sketch below).
Classical statistical and learning theory typically demonstrates a 'graceful' decay of the error as the number of samples grows. The current work belongs to a line of research demonstrating a 'sharp cutoff' phenomenon instead, initiated by GZ, RXZ in the linear regression setting.
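In symbols, the cutoff paraphrased above can be written as follows (a sketch; the shorthand MMSE_n(m) and the normalization by the prior error are chosen for illustration, not taken from the paper):

\[
  \lim_{n \to \infty} \frac{\mathrm{MMSE}_n(m)}{\mathrm{MMSE}_n(0)} \;=\;
  \begin{cases}
    0, & m \ge (1+\epsilon)\, m_*(n), \\
    1, & m \le (1-\epsilon)\, m_*(n),
  \end{cases}
  \qquad \text{for every fixed } \epsilon > 0,
\]
% MMSE_n(m) denotes the minimum mean-square error achievable from m samples, and
% MMSE_n(0) the error of the estimator that uses only the prior.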